Background

Haplotype inference is an essential stage in genetic linkage
analysis, and estimation methods are also frequently used to
reconstruct haplotypes in current genetic association studies.
Most of the latter focus on haplotype phasing in recombinant
DNA regions of unrelated individuals and use likelihood-based
methods to infer the presence of alleles at several loci, relying
on very time-consuming probabilistic algorithms.

So far, the literature has not analyzed haplotypes using
deterministic techniques, and there are hardly any alternative
methods for constructing haplotypes from non-recombinant
DNA regions, despite the fact that computational inference by
probabilistic models may cause a large number of incorrect
inferences.

Description and results 

We have developed an algorithm called alleHap, which
imputes alleles in parent-offspring pedigree databases
with missing family members and then constructs the
corresponding unambiguous haplotypes.

The alleHap algorithm is based on a preliminary analysis
of all possible combinations that may exist in the
genotyping of a family, considering that each member,
due to meiosis, should unequivocally have two alleles,
one from each parent. The analysis was founded on the
differentiation of seven cases, as described in [1], some
of which are divided into up to three variants, each
representing a different combination of alleles of the
family members (Table 1).

Table 1 Possible allelic combinations in a parent-offspring pedigree
Considering all allele combinations, the maximum number of "unique" children and alleles is four.

Figure 1 Representation of computing times according to the number of families (left) and the number of SNPs (right).
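
The bound of four "unique" children can be checked by enumerating, at a single marker, every genotype that two parents can transmit. The following is a minimal Python sketch for illustration only. It is not part of alleHap, and the function name `offspring_genotypes` and the tuple encoding of genotypes are assumptions made here.

```python
from itertools import product


def offspring_genotypes(father, mother):
    """Return the distinct unordered genotypes a child can have at one marker.

    Each parent is a pair of alleles; a child receives exactly one allele
    from each parent, so there are at most 2 x 2 = 4 combinations.
    """
    return {tuple(sorted(pair)) for pair in product(father, mother)}


# Two heterozygous parents carrying four distinct alleles reach the maximum.
max_case = offspring_genotypes(("A", "C"), ("G", "T"))
print(len(max_case))

# Parents sharing alleles produce fewer distinct children.
shared_case = offspring_genotypes(("A", "C"), ("A", "C"))
print(len(shared_case))
```

With four distinct parental alleles the enumeration yields four distinct children; any allele sharing or homozygosity in the parents reduces that number.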

The classification by cases and variants allows the
algorithm to impute missing values efficiently in the
loaded database and then to build the corresponding
unambiguous haplotypes. Furthermore, the algorithm places
no limit on the number of SNPs per haplotype, i.e. it
enables the construction of haplotypes of more than two
SNPs.
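
To illustrate what deterministic imputation can look like at a single marker, the sketch below recovers the alleles that a missing parent must have transmitted, given the genotyped parent and the children. It is a simplified illustration under stated assumptions, not the case-and-variant classification used by alleHap: the function name `impute_missing_parent` is hypothetical, genotypes are encoded as pairs of allele labels, and an allele is imputed only when a child leaves no other possibility.

```python
def impute_missing_parent(known_parent, children):
    """Impute the alleles of a missing parent at one marker.

    known_parent: pair of alleles of the genotyped parent.
    children: list of allele pairs, one per genotyped child.
    Returns a pair in which None marks an allele that cannot be
    determined unequivocally from the available family members.
    """
    transmitted = set()
    for a, b in children:
        # Alleles the missing parent could have passed to this child:
        # whichever allele remains after the known parent supplies the other.
        options = set()
        if a in known_parent:
            options.add(b)
        if b in known_parent:
            options.add(a)
        if not options:
            raise ValueError("child is inconsistent with the known parent")
        if len(options) == 1:
            # Only one explanation exists, so this allele is certain.
            transmitted |= options
    if len(transmitted) > 2:
        raise ValueError("children imply more than two alleles in one parent")
    alleles = sorted(transmitted)
    return tuple(alleles + [None] * (2 - len(alleles)))


# The known parent is A/C; the children carry G and T from the other parent.
print(impute_missing_parent(("A", "C"), [("A", "G"), ("C", "T")]))
```

When a child is heterozygous for exactly the alleles of the known parent, both explanations remain possible and nothing is imputed from that child, which keeps the result free of guesses.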

By analyzing all possible combinations of a parent-offspring
pedigree in which parents may be missing, and provided that
at least one child has been genotyped, an unequivocal
imputation of three possible parent haplotypes is
theoretically possible in 92.3% of cases even when one parent
is missing. When neither parent has been genotyped, at least
two haplotypes can be constructed in 36.4% of cases.
Regarding offspring allele imputation with both parents fully
genotyped, a minimum of one haplotype for each child may be
successfully reconstructed in 6.1% of possible cases.
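
For the case in which both parents are genotyped, the idea of building a child's haplotypes marker by marker can be sketched as follows. This is an illustrative Python example under the zero-recombination assumption, not the alleHap implementation: the function name `phase_child` and the list-of-pairs encoding are assumptions, and positions whose parental origin is ambiguous within the trio are left as `None` rather than guessed.

```python
def phase_child(father, mother, child):
    """Split a child's genotypes into paternal and maternal haplotypes.

    Each argument is a list of allele pairs, one pair per marker.
    Returns (paternal, maternal) lists, with None where the origin of
    the child's alleles cannot be decided from this trio alone.
    """
    paternal, maternal = [], []
    for f, m, (a, b) in zip(father, mother, child):
        # Collect every (from_father, from_mother) assignment that is
        # compatible with the parental genotypes at this marker.
        options = set()
        if a in f and b in m:
            options.add((a, b))
        if b in f and a in m:
            options.add((b, a))
        if not options:
            raise ValueError("child is inconsistent with the parents")
        if len(options) == 1:
            pat, mat = next(iter(options))
        else:
            pat = mat = None  # both assignments remain possible
        paternal.append(pat)
        maternal.append(mat)
    return paternal, maternal


father = [("A", "C"), ("G", "G"), ("A", "T")]
mother = [("C", "C"), ("G", "T"), ("A", "T")]
child = [("A", "C"), ("G", "T"), ("A", "T")]
print(phase_child(father, mother, child))
```

In this example the third marker stays unresolved because all three family members are heterozygous for the same alleles; genotypes from additional children are what would allow such positions to be resolved.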

Evaluation of the results (Figure 1) shows that the three
alleHap computational tasks, namely Simulation, Imputation
and Reconstruction, perform efficiently. Their execution
times remain low even when considering a large number of
families (< 2000) and SNPs (< 50).

Figure 2 shows that our algorithm achieves high allele
imputation rates (about 65%) even when the probability
of missing parents in each family is high (>50%). Regarding
haplotype reconstruction rates, there is an almost linear
relationship between reconstruction rates and the
number of missing individuals per family. This is because
alleHap is mainly based on the information included in
the offspring, so the more children that are missing the
more difficult it is to reconstruct the family haplotypes.

Figure 2 Representation of allele imputation rates (left) and haplotype reconstruction rates (right).

Conclusions 

alleHap has been tested with simulations and with the
Type 1 Diabetes Genetics Consortium [2] database. The
algorithm is robust against inconsistencies within
the genotypic data and runs quickly, even when handling
large amounts of data. Imputing the missing data may
improve results in numerous epidemiological and/or
genetic linkage studies.

Our algorithm could be a useful instrument for informa- 
tion retrieval and knowledge discovery in genetics, since it 
would allow epidemiological specialists to discover new 
intergenic patterns by studying zero-recombinant haplo- 
types with a larger number of SNPs from family-based 
databases. 